Abstract

In mixed-criticality applications, tension exists between sharing
and isolation with respect to hardware resources: while strong
isolation might be required for highly critical tasks, somewhat
permissive sharing might be reasonable for less critical tasks to
improve throughput or average-case performance. In this paper, this
tension is examined as it pertains to shared last-level caches (LLCs)
on multicore platforms. In particular, criticality-aware optimization
techniques based on linear programming are presented for allocating
LLC areas in the context of the previously proposed MC²
(mixed-criticality on multicore) framework. Experiments are also
presented that show that these techniques can result in significant
schedulability improvements.

1  Introduction 

The adoption of multicore machines in safety-critical domains is
being hampered by aspects of such machines that reflect a
throughput-oriented design philosophy. For example, it is common
practice today to allow hardware components such as last-level caches
(LLCs) and memory controllers to be shared across cores; this can be
beneficial as long as any detrimental effects due to sharing are not
typically seen on average. Unfortunately, such sharing can result in
timing behaviors that are exceedingly difficult to characterize in the
worst case without excessive pessimism. This is problematic for
safety-critical domains, where correct execution must be validated
even in worst-case scenarios.

Excessive pessimism due to shared hardware is a key contributing
factor to a problem termed here the “one-out-of-m” problem: when
checking real-time constraints on a multicore platform with m cores,
analysis pessimism can easily negate the processing capacity of the
additional m − 1 cores. In effect, only “one core’s worth” of capacity
can be utilized even though m cores are available. In domains such as
avionics, this problem has led to the common practice of simply
disabling all but one core.¹ This problem is the most serious
unresolved obstacle in work on real-time multicore resource allocation
today.

The desire to reduce the pessimism caused by unmanaged shared
hardware has led to intense recent interest in hardware management
techniques [1, 9, 10, 11, 14, 18, 19]. A common goal here is to
provide isolation by partitioning hardware resources among cores
and/or tasks to eliminate sharing altogether. However, this can be an
overly strong solution in many contexts: even safety-critical
applications often have system components that are not highly critical
and that could therefore benefit from less constrained sharing. A
better way forward might be to achieve some appropriate balance
between sharing and isolation based on the criticalities of the
software components involved. In this paper, we investigate this issue
of balance as it relates to shared LLCs.

¹In fact, the U.S. Federal Aviation Administration is currently
considering the possibility of mandating such an approach when
multicore platforms are used in avionic systems.

Mixed-criticality systems. Our work fits within the larger body of
research on mixed-criticality (MC) resource allocation spawned by a
seminal paper of Vestal [17]. He proposed analyzing the real-time
requirements of less critical tasks under less pessimistic analysis
assumptions. Specifically, to analyze a system with L criticality
levels, he proposed specifying a provisioned execution time (PET) for
each task at every level and analyzing L different system variants: in
the Level-$\ell$ variant, the real-time requirements of all
Level-$\ell$ tasks are verified with Level-$\ell$ PETs assumed for all
tasks (at any level). The degree of pessimism in determining PETs is
level-dependent: if Level $\ell$ is of higher criticality than Level
$\ell'$, then Level-$\ell$ PETs will generally be greater than
Level-$\ell'$ PETs. Vestal’s work led to approximately 200 follow-up
papers on MC scheduling by a variety of authors. An excellent survey
of this work has been prepared by Davis and Burns [4]. They identify
the fundamental research question in this area as “reconcil[ing] the
conflicting requirements of partitioning for (safety) assurance and
sharing for efficient resource usage.” This is the very issue
investigated herein (as it relates to shared LLCs).

Cache partitioning. Under cache partitioning, designated cache areas
are assigned to certain tasks, sets of tasks, or cores. Assuming a
set-associative cache, this can be achieved through some combination
of page coloring, to provide set-based partitioning, or the use of
hardware support in the form of lockdown registers, to provide
way-based partitioning. These alternatives are illustrated in Fig. 1
with respect to a quad-core ARM Cortex A9 machine, which is the
canonical hardware platform considered herein. As seen in inset (a),
each core on this machine has a lockdown register, the bits of which
can be cleared to steer LLC accesses from this core to certain ways of
the LLC. Under page coloring, pages of physical memory are assigned
colors, and sets of the LLC are colored corresponding to how such
pages map to them. As seen in inset (b), this technique ensures that
differently colored pages cannot cause conflicts in the LLC.

(a) Way-based partitioning.

(b) Set-based partitioning.

Figure 1: Allocating LLC areas by way, or set, or both. On this
machine, the LLC is an L2 cache shared by four cores.

MC². Our examination of LLC allocation tradeoffs in MC systems is
based upon the MC² (mixed-criticality on multicore) framework
[8, 16, 18], which has been the subject of continuing research by our
group.² In MC², four criticality levels exist, denoted A (highest)
through D (lowest), as shown in Fig. 2. Higher-criticality tasks are
statically prioritized over lower-criticality ones. Level-A tasks are
partitioned and scheduled on each core using a time-triggered
table-driven cyclic executive.³ Level-B tasks are also partitioned but
are scheduled using a rate-monotonic (RM) scheduler on each core.³ On
each core, the Level-A and -B tasks are required to be simply periodic
(all tasks commence execution at time 0 and periods are harmonic),
with the Level-B task periods being integer multiples of the Level-A
hyper-period. These tasks have hard real-time (HRT) constraints.
Level-C tasks have soft real-time (SRT) constraints and are scheduled
via a global earliest-deadline-first (G-EDF) scheduler;³ the
considered SRT constraint is that deadline tardiness is provably
bounded. Level-D tasks are scheduled with no real-time guarantees (so
we do not consider them further). MC² is a flexible framework from a
research point of view. For example, it can be configured to have only
two HRT criticality levels (as in most theoretical work on MC
scheduling) or to fully assign the Level-A and -B subsystems to
distinct, dedicated cores.

²MC² should not be confused with a similarly named European project
that began several years after work on MC² commenced.

³An RM (EDF) scheduler can be optionally used at Level A (B).
Additionally, any G-EDF-like (GEL) scheduler [7] can be used at
Level C. Furthermore, Level-C tasks can be defined according to the
sporadic task model. For simplicity, we do not consider these options
further herein. Other facets of MC², such as slack reallocation,
schedulability conditions, and execution-time budgeting, are discussed
in prior papers [8, 16, 18].

Figure 2: Scheduling in MC² on a quad-core machine.

In recent work, we extended MC² to support partitioning with respect
to the LLC and DRAM banks and to isolate the operating system from
application tasks. In a companion paper [12], we describe how these
features were implemented and present experiments that demonstrate the
virtues of the supported isolation mechanisms in MC systems. In that
effort, we considered a single generic LLC allocation strategy.

Contributions. In this paper, we consider the problem of optimizing
LLC allocations in the context of MC² for a specific task system. We
consider a general criticality-aware LLC allocation framework that
allows leeway in precisely determining allocated LLC areas for a
specific task system. We study the problem of determining such
allocations formally. We first discuss how to model the impacts of a
given allocation strategy on LLC-related overheads and task execution
times. We then adopt a particular model that allows us to determine
LLC allocations by solving a linear program (LP). To analyze the
effectiveness of this approach, we present a schedulability study
involving randomly generated task systems where generated task
execution times were based on measurement data. In this study, the
usage of our techniques enabled schedulability improvements of up to
100% for some task-system categories in comparison to two generic
task-system-oblivious LLC allocation strategies, including that
considered in [12]. The presented LP can be solved as either an
ordinary LP or a mixed-integer LP (MILP). In our experiments, both
variants exhibited similar runtime performance and often yielded
nearly identical schedulability results, but for some task systems,
schedulability was noticeably better under the MILP variant.

To our knowledge, LLC allocation strategies for MC systems have not
been considered before, particularly in a context with as many
interesting tradeoffs as MC². Nonetheless, there has been much prior
work on cache partitioning. We review this work later in Sec. 2 to
more properly position our contributions.

Organization. In the remainder of the paper, we provide relevant
background (Sec. 2), describe our LLC allocation techniques (Secs. 3
and 4), present our experimental evaluation (Sec. 5), and conclude
(Sec. 6).

2  Background


In  this  section,  we  present  relevant  notation,  formally  define 
the  problem  solved  in  this  paper,  and  discuss  related  work. 


Task system notation. We consider a set of implicit-deadline periodic
tasks $\Gamma = \{\tau_1, \tau_2, \tau_3, \ldots, \tau_N\}$ to be
scheduled under the MC² framework on $m$ cores. We only consider
Levels A-C in MC², as Level D is non-real-time. Each task $\tau_i$ has
a period $T_i$ and three PETs, $e_i^A$, $e_i^B$, and $e_i^C$, where
$e_i^\ell$ denotes its Level-$\ell$ PET (recall the discussion
concerning MC schedulability analysis in Sec. 1). We let $\Gamma_A$,
$\Gamma_B$, and $\Gamma_C$ denote the subset of tasks in $\Gamma$ at
Levels A, B, and C, respectively. Also, we let $\Gamma_{A,p}$ and
$\Gamma_{B,p}$ denote the subset of tasks in $\Gamma_A$ and
$\Gamma_B$, respectively, that are assigned to core $p$. We denote the
total utilization of all Level-$\ell$ tasks assuming Level-$\ell'$
execution times as

$$U_\ell^{\ell'} = \sum_{\tau_i \in \Gamma_\ell} \frac{e_i^{\ell'}}{T_i}.$$

We denote the total utilization of all Level-A or -B tasks assigned to
core $p$ assuming Level-$\ell'$ execution times as

$$U_{A,p}^{\ell'} = \sum_{\tau_i \in \Gamma_{A,p}} \frac{e_i^{\ell'}}{T_i}
\quad\text{and}\quad
U_{B,p}^{\ell'} = \sum_{\tau_i \in \Gamma_{B,p}} \frac{e_i^{\ell'}}{T_i},$$

respectively. The schedulability condition for Level C is dependent on
the largest utilization of any task at Level C, which we denote as
$h$, and the sum of the $m - 1$ largest task utilizations at Level C,
which we denote as $H$. The following are sufficient conditions for
ensuring schedulability at all three criticality levels [16].

$$\forall p:\; U_{A,p}^{A} \le 1 \;\wedge \tag{1}$$

$$\forall p:\; U_{A,p}^{B} + U_{B,p}^{B} \le 1 \;\wedge \tag{2}$$

$$U_{A}^{C} + U_{B}^{C} + U_{C}^{C} \le m \;\wedge \tag{3}$$

$$U_{A}^{C} + U_{B}^{C} + (m - 1)h + H < m \tag{4}$$
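
As a rough illustration of how conditions (1)-(4) fit together, the following Python sketch evaluates them for a small task set. The `Task` record, its field names, and the function names are hypothetical and were chosen only for this example; the comparison operators follow the conditions as stated above.

```python
from dataclasses import dataclass


@dataclass
class Task:
    level: str     # criticality level: "A", "B", or "C"
    core: int      # assigned core (used for Levels A and B only)
    period: float  # T_i
    pet: dict      # PETs keyed by analysis level, e.g. {"B": 4.0, "C": 2.0}


def util(tasks, at_level):
    """Total utilization of `tasks` assuming PETs of analysis level `at_level`."""
    return sum(t.pet[at_level] / t.period for t in tasks)


def schedulable(tasks, m):
    """Evaluate sufficient conditions (1)-(4) on m cores."""
    a = [t for t in tasks if t.level == "A"]
    b = [t for t in tasks if t.level == "B"]
    c = [t for t in tasks if t.level == "C"]
    for p in range(m):
        a_p = [t for t in a if t.core == p]
        b_p = [t for t in b if t.core == p]
        if util(a_p, "A") > 1:                    # condition (1)
            return False
        if util(a_p, "B") + util(b_p, "B") > 1:   # condition (2)
            return False
    if util(a, "C") + util(b, "C") + util(c, "C") > m:  # condition (3)
        return False
    # h: largest Level-C task utilization; H: sum of the m - 1 largest.
    c_utils = sorted((t.pet["C"] / t.period for t in c), reverse=True)
    h = c_utils[0] if c_utils else 0.0
    big_h = sum(c_utils[: m - 1])
    return util(a, "C") + util(b, "C") + (m - 1) * h + big_h < m  # (4)
```

Because the conditions are only sufficient, a `False` result here means the test is inconclusive for that task set, not that the task set is necessarily unschedulable.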

We assume that PETs are determined through a measurement process, as
often done in practice (indeed, on multicore platforms adequate static
timing-analysis tools do not yet exist). Specifically, we assume that
Level-C PETs reflect measured average-case execution times⁴ (since
Level C is SRT) and that Level-B PETs reflect measured worst-case
execution times (since Level B is HRT). Further, we assume that
Level-A PETs are defined by inflating Level-B PETs by 50% (since
Level A is of highest criticality). Such an inflation is in keeping
with inflation factors derived from industrial use cases considered by
Vestal [17]. These measurement-based PETs will generally depend on
allocated LLC areas.⁵ We denote the Level-$\ell$ PET of task $\tau_i$
when its allocated LLC area consists of $W$ ways and $S$ colors (refer
to Fig. 1) as $e_i^\ell(W, S)$. (We use “$S$” in denoting colors
because colors determine LLC sets, and the term “C” has a predefined
meaning in the context of MC².)

⁴In MC², a Level-$\ell$ task’s Level-$\ell$ PET is treated as an
enforced execution budget. As explained in [15], tardiness bounds with
respect to deterministic budget allocations at Level C can be used to
bound tardiness in expectation when average-case task execution times
are assumed.

⁵We often use the term “area” instead of “partition” because we allow
for the possibility that such areas overlap.

Figure 3: Canonical LLC allocation.

Canonical LLC allocation and problem to be solved. We consider a
canonical LLC allocation, which is illustrated in Fig. 3 with respect
to our quad-core ARM platform, the LLC of which has 16 colors and 16
ways. Assuming an LLC with $W^{\max}$ ways in total, all Level-C tasks
together are allocated an LLC area that consists of all colors (sets)
associated with ways $W_C$ through $W^{\max} - 1$ for some $W_C$. All
LLC areas for Level-A and -B tasks are taken from the colors (sets)
associated with ways 0 through $W_C - 1$. The Level-A and -B tasks on
each core use an LLC area consisting of $1/m$ ($m = 4$ on our
platform) of the colors (sets) associated with these ways, as
depicted. Each per-core Level-A and -B LLC area is subdivided into
potentially overlapping Level-A and -B areas. This allocation scheme
provides the following notions of spatial and temporal isolation with
respect to the LLC (spatial isolation is guaranteed when access to
common LLC areas is categorically prevented, and temporal isolation is
guaranteed when a task’s lines in a common LLC area cannot be evicted
while it is using them).

•  Level-C  tasks  are  spatially  isolated  from  Level-A  and 
-B  tasks. 

•  Level-A  and  -B  tasks  on  one  core  are  spatially  isolated 
from  Level-A  and  -B  tasks  on  other  cores. 

•  Level-A and -B tasks on the same core are spatially isolated with
respect to the ways that they do not share. Additionally, Level-A
tasks are temporally isolated from Level-B tasks with respect to the
ways they share because Level-A tasks have higher priority.

This  general  allocation  strategy  reflects  two  assumptions: 
Level-C  tasks,  being  SRT  and  provisioned  on  the  average 
case,  might  benefit  from  rather  unrestricted  LLC  sharing; 
Level-A  and  -B  tasks,  being  HRT  and  more  critical,  might 
require  stronger  LLC  isolation  guarantees.  With  regard  to 
the  latter,  note  that  the  set  of  Level-A  tasks  on  one  core  is 
completely  isolated  (either  spatially  or  temporally)  from  all 
other  tasks  in  the  system  with  respect  to  the  LLC. 

The technical problem considered in this paper is to determine how to
precisely size these LLC areas so as to enhance schedulability given
the characteristics of the task system in question. That is, we seek
to determine how the bold lines in Fig. 3 should be set. In addressing
this problem, we assume that an assignment of all Level-A and -B tasks
to cores has already been determined.

(a) Task-centric accounting.

(b) Preemption-centric accounting.

Figure 4: Forms of overhead accounting.

Overhead accounting. Depending on how task systems are analyzed,
execution times and schedulability conditions may not include the
impact of system overheads. In this paper, we consider one such
overhead, cache-related preemption delays (CRPDs), and how this
overhead is affected by our LLC allocation methods. CRPDs are delays a
task may incur to reload lines evicted from the LLC (and other caches)
due to a preemption. We discuss how to quantify CRPDs with respect to
LLC allocation sizes, so that these delays can be integrated into
schedulability analysis.

There are two basic ways to account for CRPD costs, as shown in
Fig. 4. Under task-centric accounting, the execution time of the
preempted job is inflated to account for the preemption. Under
preemption-centric accounting, the execution time of the preempting
job is inflated to “pay” for the CRPD cost of any preempted job that
resumes execution when the preempting job completes. We consider
preemption-centric accounting here, because it usually introduces less
pessimism in schedulability analysis, and because it can be linearly
modeled by simply adding an inflation term to each execution cost.
(Task-centric accounting entails the introduction of non-linear
ceiling and/or floor operators.)

Related work. Having fully specified the problem to be solved in this
paper, and some of the assumptions we make in solving it, we now
discuss related work.

The use of cache partitioning in real-time systems has been
investigated before. A good overview of early work on this topic has
been given by Kirk [13]. In more recent work, Kim et al. [11]
presented a cache-partitioning scheme that allows multiple tasks to
share the same cache partition on a single processor (as we do for
Level-A and -B tasks), but they did not consider MC systems. Altmeyer
et al. [1] considered uniprocessor scheduling on a system with a
direct-mapped cache and examined worst-case execution time (WCET)
estimates as a function of cache size. They also presented a
cache-partitioning algorithm that is optimal under certain
cache-modeling assumptions. As an alternative to cache partitioning, a
technique called cache lockdown can be used that prevents designated
cached data or instructions from being evicted [5]. Also, it is
possible to redesign the cache allocator itself to provide a
replacement policy that enables greater predictability [9].

3  MC² LLC-Managed Overhead Accounting

In Sec. 2, we discussed preemption-centric CRPD accounting. In this
section, we discuss our methods for determining required overhead
inflations for task execution times under a managed LLC. These methods
for overhead accounting ensure that the task execution-time properties
used by our linear programs, discussed in Sec. 4, hold for both
inflated and non-inflated execution times.

The inflation term we add is generally a function of a task’s
allocated LLC area size. For example, we can inflate the Level-$\ell$
execution time of any Level-B or -C task that has an LLC area
consisting of $W$ ways and $S$ colors as follows:

$$\forall i : \tau_i \in \Gamma_B \cup \Gamma_C ::\;
e'^{\ell}_i(W, S) = e^{\ell}_i(W, S) + E^{\ell}(W, S), \tag{5}$$

where $E^{\ell}(W, S)$ is the time required, according to Level-$\ell$
analysis, to reload all cache lines within a region of the LLC
consisting of $W$ ways and $S$ colors. Note that this is the LLC area
of both the preempting and preempted task: for preemptions of Level-B
tasks by Level-B tasks (or Level-C tasks by Level-C tasks), the
preempting job shares the same LLC area as the preempted job. We
assume a constant time $b^{\ell}$ is required under Level-$\ell$
analysis assumptions to load the lines within an LLC area consisting
of only one way and one color. Under this assumption, our inflation
term is

$$E^{\ell}(W, S) = W \cdot S \cdot b^{\ell}. \tag{6}$$
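
The inflation in (5) and (6) amounts to one multiplication and one addition per task. The following minimal Python sketch shows this; the function names, the parameter `b_level` (the per-way, per-color reload constant for the analysis level in question), and any numeric values passed in are illustrative assumptions, not measured data.

```python
def reload_time(ways, colors, b_level):
    """Eq. (6): time to reload an LLC area of `ways` ways and `colors` colors,
    given the constant reload time `b_level` for one way and one color."""
    return ways * colors * b_level


def inflated_pet(pet, ways, colors, b_level):
    """Eq. (5): a Level-B or -C task's PET plus the reload-time inflation."""
    return pet + reload_time(ways, colors, b_level)
```

For instance, with a hypothetical constant of 0.5 time units, a task with a PET of 10 time units and an area of 4 ways and 2 colors would be charged `inflated_pet(10.0, 4, 2, 0.5)`, i.e. 14 time units.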

We now explain how to introduce inflations into the schedulability
conditions (1)-(4) discussed earlier. To do so, we can substitute for
each utilization term a corresponding inflated utilization term. We
denote the inflated Level-$\ell$ utilization of each task $\tau_i$ as
$u'^{\ell}_i = e'^{\ell}_i / T_i$. We can then define inflated Level-B
and -C utilizations as follows.

$$U'^{\ell}_{B,p} = \sum_{\tau_i \in \Gamma_{B,p}} u'^{\ell}_i$$

$$U'^{C}_{C} = \sum_{\tau_i \in \Gamma_C} u'^{C}_i$$

We also replace $h$ and $H$ in condition (4) with inflated terms $h'$
and $H'$. $h'$ is the highest inflated Level-C utilization of any
Level-C task, and $H'$ is the sum of the $m - 1$ highest inflated
Level-C utilizations at Level C.

Note that we do not apply the inflation described in Equation (5) to
Level-A jobs. That is, we have

$$\forall i : \tau_i \in \Gamma_A ::\; e'^{\ell}_i(W, S) = e^{\ell}_i(W, S),$$

$$\forall p ::\; U'^{A}_{A,p} = U^{A}_{A,p}.$$

Figure 5: Per-frame Level-B inflation for Level A. (Legend: release,
frame boundary, inflation.)

Under the cyclic-executive model [2], scheduling is based on
fixed-length frames. Each Level-A job runs non-preemptively within a
frame unless it is sliced. Job slicing allocates different portions of
a job (job slices) to different frames. We assume the execution time
of each job slice is measured independently of other slices when PETs
are initially determined. This ensures the PETs of one slice are not
affected by cache lines loaded by other job slices.

Level-A jobs may still produce other CRPD overheads. In the event that
the LLC areas for Levels A and B on core $p$ overlap, the Level-A
tasks on core $p$ may evict all of the cache lines of Level-B tasks
within the overlap. This might suggest that the Level-B execution time
of each Level-A job requires inflation. However, the required
inflation can be less pessimistically determined. As shown in Fig. 5,
Level-A jobs allocated to a frame $f$ run sequentially at the
beginning of each frame. In this scenario, Level B is only preempted
by Level A at most once per frame.

Depending on the replacement policy of the cache, evictions by Level-A
tasks within overlapping sets may cause Level-B tasks to evict
additional lines throughout the ways allocated to Level B in the
overlapping sets. For Level B, we make the pessimistic assumption that
the number of evictions directly or indirectly caused by a Level-A
task is equal to the area allocated to Level B in sets it shares with
Level A. For Level C, we make the more optimistic assumption that, on
average, the number of evicted cache blocks is equal to the size of
the overlap.

The frame size for the cyclic executive of Level-A tasks on core $p$
is equal to the smallest period of any task in $\Gamma_{A,p}$, which
we denote $T^{\min}_{A,p}$. If the overlap on core $p$ consists of
$W_p$ ways and $S_p$ colors and Level B is allocated $W_{B,p}$ ways,
we can model the overhead associated with reloading cache lines
allocated to Level B once per frame by inflating the Level-B and -C
utilizations of Level A.

$$\forall p ::\; U'^{B}_{A,p} = U^{B}_{A,p} + \frac{E^{B}(W_{B,p}, S_p)}{T^{\min}_{A,p}}$$

$$\forall p ::\; U'^{C}_{A,p} = U^{C}_{A,p} + \frac{E^{C}(W_p, S_p)}{T^{\min}_{A,p}}$$

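Our schedulability conditions with CRPD overheads accounted for are
the following.

$$\forall p ::\; U'^{A}_{A,p} \le 1 \tag{7}$$

$$\forall p ::\; U'^{B}_{A,p} + U'^{B}_{B,p} \le 1 \tag{8}$$

$$\sum_{p=1}^{m} \left( U'^{C}_{A,p} + U'^{C}_{B,p} \right) + U'^{C}_{C} \le m \tag{9}$$

$$\sum_{p=1}^{m} \left( U'^{C}_{A,p} + U'^{C}_{B,p} \right) + (m - 1)h' + H' < m \tag{10}$$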


4  Linear  Programming 

In this section, we show how to solve the canonical LLC allocation
problem described in Sec. 2 via a linear program (LP). The LP we
obtain determines a choice of ways for each allocated LLC area such
that the schedulability conditions (7)-(10) are maintained. This
requires treating ways as continuous variables. We explain later how
to ultimately obtain an integral solution. We now describe the various
sets of constraints in our final LP.

LLC size constraints. The simplest constraint set ensures that there
is no overlap between Level C’s partition and any allocated LLC areas
at higher criticality levels. We let $W_{A,p}$ and $W_{B,p}$ denote
continuous LP variables indicating the number of ways allocated to
Levels A and B, respectively, on core $p$. We let $W_C$ denote a
continuous LP variable indicating the number of ways allocated to
Level C.

LLC size constraints also determine the overlap between Level-A and -B
LLC areas. $W_p$ denotes a continuous LP variable modeling the overlap
on core $p$. Recall that $W^{\max}$ is the total number of ways in the
considered LLC. If Level-A and -B LLC areas overlap on core $p$, the
overlap $W_p$ satisfies

$$W_p + W^{\max} = W_{A,p} + W_{B,p} + W_C.$$

Constraint Set 1. The LLC size constraints are as follows.

$$\forall p ::\; W_{A,p} + W_C \le W^{\max}$$

$$\forall p ::\; W_{B,p} + W_C \le W^{\max}$$

$$\forall p ::\; W_p + W^{\max} \ge W_{A,p} + W_{B,p} + W_C$$
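
To make the size constraints concrete, the sketch below checks them for one core and computes the smallest overlap they permit. The function and parameter names are hypothetical, `w_max` stands for the total number of LLC ways, and clamping the overlap at zero is an assumption of this sketch (it treats the overlap variable as non-negative).

```python
def min_overlap(w_a, w_b, w_c, w_max):
    """Smallest Level-A/Level-B overlap (in ways) that fits the allocation.

    Levels A and B share the w_max - w_c ways not given to Level C, so any
    excess of w_a + w_b over that budget must be overlap.
    """
    return max(0.0, w_a + w_b + w_c - w_max)


def satisfies_size_constraints(w_a, w_b, w_c, w_overlap, w_max):
    """Check the three LLC size constraints for a single core."""
    return (w_a + w_c <= w_max
            and w_b + w_c <= w_max
            and w_overlap + w_max >= w_a + w_b + w_c)
```

On a 16-way LLC, for example, giving 8 ways to Level A, 6 to Level B, and 4 to Level C forces an overlap of at least 2 ways between the Level-A and Level-B areas on that core.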

Modeling execution times. The manner in which we model the impact of
allocated LLC area sizes on execution times affects the choice of
algorithms that can be applied to determine such sizes. Without a
clear relationship between execution times and area sizes, there may
be no way to determine how adjustments to such sizes impact
schedulability except through brute-force trial and error. Given the
manner in which tasks are prioritized in MC², and the canonical LLC
allocation framework described above, we require both worst- and
average-case execution-time measurements of Level-A and -B tasks, and
average-case measurements for Level-C tasks. The Level-A and -B
measurements may be taken in a system under load but with isolation
provided with respect to the LLC, as described above. The Level-C
measurements also need to be taken in a system under load to account
for the impact of concurrent evictions and memory-bus contention at
Level C (although it may be appropriate to obtain average-case
measurements under a less heavy load than for worst-case
measurements).

Such  execution-time  measurements  often  exhibit  a  prop¬ 
erty  we  will  exploit: 

Execution Time Assumption. The derivative of a task's execution time
(at any level) with respect to its allocated LLC area size is
non-decreasing. That is, the execution time function is convex.

Bui et al. [3] presented graphs for execution times of several
avionics applications that approximately meet this condition,
suggesting that this behavior is not uncommon. Our measurements for
several benchmark programs on our Cortex A9 platform exhibit similar
behavior [12]. We note three properties that directly follow from this
assumption.

Lemma 1. The derivative of a task's inflated execution time (at any
level) with respect to its allocated LLC area size is non-decreasing.

Proof: For our LLC allocation problem, colors are fixed at each level,
such that the execution time function for each task $\tau_i$ is a
function over the number of ways allocated to $\tau_i$. By (6), the
inflation function varies linearly with allocated ways, and is thus
convex. The sum of two convex functions is convex. □

Lemma 2. The derivative of a task's inflated utilization (at any
level) with respect to its allocated LLC area size is non-decreasing.

Proof: This follows from the fact that task utilizations are directly
proportional to task execution times. □

We could proceed with the construction of our LP by treating
individual task utilizations as variables, but this would entail
having $O(N)$ variables. We can limit the number of variables to
$O(m)$ by instead considering the combined utilizations of sets of
tasks. This is supported by one final property.

Lemma 3. The derivative of the inflated utilizations (at any level) of
a set of tasks with respect to their allocated LLC area size is
non-decreasing.

Proof: As stated earlier, the sum of convex functions is convex. □

While  some  of  the  assumptions  made  here  concerning 
execution  times  may  result  in  over-approximations  of  such 
execution  times  so  that  these  assumptions  are  met,  we  show 
later  via  a  schedulability  study  that  our  LLC  allocation 
methods  yield  substantial  schedulability  improvements. 

PET-  and  overhead-based  constraints.  Consider  the  hy¬ 
pothetical  utilization  plot  shown  in  Fig.  6  for  U'^  with  re¬ 
spect  to  some  integer  number  of  allocated  LLC  ways  W. 
We  can  construct  such  a  plot  from  execution-time  measure¬ 
ments,  known  task  periods,  and  known  values  for  and 

.  In  Fig.  6,  we  create  a  set  of  lines  from  each  pair  of  adja¬ 
cent  data  points,  using  the  standard  two-point  line  formula 


Figure 6: Converting utilizations derived from execution time
data to linear constraints. The shaded region is the continuous
region in which utilization will be constrained in our LP.
(Plot legend: data point, constraint; horizontal axis: number
of ways.)

$f(x) = f(x_0) + (x - x_0)(f(x_0 + 1) - f(x_0))$. This is
the formula for the line that contains the points
$(x_0, f(x_0))$ and $(x_0 + 1, f(x_0 + 1))$. We can describe
the value of $f$ over a continuous domain with LP variables
$f$ and $x$ constrained by such lines. Let $x^{max}$ denote the
maximum value of $x$ for which we have a data point for $f(x)$.
A value can be determined for $f$ by solving the following LP.

minimize $f$
subject to:
    $f \ge f(x_0) + (x - x_0)(f(x_0 + 1) - f(x_0))$
        for each $x_0 \in \{0, \ldots, x^{max} - 1\}$
    $f \ge 0$
    $0 \le x \le x^{max}$
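
As a minimal numeric sketch of this construction (not the
authors' implementation), the Python snippet below evaluates,
for a fixed value of x, the smallest f that satisfies every
line constraint. The data values are hypothetical, and the
sketch assumes that each additional way lowers the value by no
more than the previous way did, which is the shape the line
constraints rely on.

```python
def envelope(data, x):
    """Smallest f meeting every two-point line constraint (and f >= 0) at x.

    data[k] holds the data point f(k) for k = 0..x_max; x may be fractional.
    """
    lines = [data[k] + (x - k) * (data[k + 1] - data[k])
             for k in range(len(data) - 1)]
    return max(0.0, max(lines))


# Hypothetical utilization data with diminishing returns per added way.
data = [1.00, 0.70, 0.55, 0.47, 0.43]

at_integers = [envelope(data, k) for k in range(len(data))]
between = envelope(data, 1.5)   # lies on the line through f(1) and f(2)
```

At integer values of x the envelope coincides with the data
points; between two integers it follows the line through the
two neighboring data points.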

If this LP produces an integer value for $x$, then $f$ will
equal $f(x)$. In the case considered in Fig. 6, our discrete
function is $U'^{C}_{C}(W)$, for which we define the LP variable
$U^{C}_{C}$ to describe $U'^{C}_{C}(W)$ over a continuous domain.

$U^{C}_{C} \ge U'^{C}_{C}(W) + (W_C - W)(U'^{C}_{C}(W + 1) - U'^{C}_{C}(W))$

Note that $W_C$ is the only variable in the right-hand-side
expression above, i.e., this is a linear expression. We define
similar LP variables $U^{\ell}_{A,p}$ and $U^{\ell}_{B,p}$ for
the inflated utilizations of Levels A and B for each core $p$.
To simplify the constraints presented for these variables, we
introduce the following shorthand functions for lines
constructed from data points for utilizations.

$\mathcal{L}^{\ell}_{A,p}(W_{A,p}, W) =
    U^{\ell}_{A,p}(W) + (W_{A,p} - W)(U^{\ell}_{A,p}(W + 1) - U^{\ell}_{A,p}(W))$

$\mathcal{L}^{\ell}_{B,p}(W_{B,p}, W) =
    U'^{\ell}_{B,p}(W) + (W_{B,p} - W)(U'^{\ell}_{B,p}(W + 1) - U'^{\ell}_{B,p}(W))$

$\mathcal{L}^{C}_{C}(W_C, W) =
    U'^{C}_{C}(W) + (W_C - W)(U'^{C}_{C}(W + 1) - U'^{C}_{C}(W))$

Note that $\mathcal{L}^{\ell}_{B,p}(W_{B,p}, W)$ and
$\mathcal{L}^{C}_{C}(W_C, W)$ depend on inflated utilizations,
while $\mathcal{L}^{\ell}_{A,p}(W_{A,p}, W)$ does not. This is
because of the different manner in which CRPDs are dealt
with at Level A, as discussed earlier.

To handle Level-A inflations, we incorporate them into
the constraints for utilization variables separately from data
points, as shown in the following constraint set. Letting
$S^{max}$ denote the total number of colors of the considered
LLC ($S^{max} = 16$ on our ARM platform), we let $I^{B}_{A,p}$
and $I^{C}_{A,p}$ denote the needed Level-A inflations on core
$p$ at Levels B and C, respectively, with respect to our LP
variables for ways.


then the non-convexity of $h'(W)$ follows trivially, because
the utilization of $\tau_h(W)$ is non-convex. Non-convexity
still holds if $\tau_h(W)$ changes with $W$. Consider way values
$W$ and $W + 1$ such that $\tau_h(W) \ne \tau_h(W + 1)$. This
implies that the derivative of $\tau_h(W + 1)$'s utilization is
greater than the derivative of $\tau_h(W)$'s utilization at $W$.
Hence, the derivative of $h'$ is greater at $W + 1$ than at $W$,
and $h'$ remains non-convex at $W + 1$. By similar logic,
$H'(W)$ is guaranteed to be non-convex.

Constraint Set 3. The linear constraints for $h$ and $H$ based
on measured task-set utilizations with CRPD overheads are
as follows.


^  _E^{WB,p,S^^ym) 


= 

^A,p  — 


tC  = 

^A,p  — 


'Bmin 

Ap 


E^{W^  jm) 


n^min 

-‘-A,p 


$S^{max}/m$ gives the total number of colors allocated to
Levels A and B on each core. Note that at Level B, an inflation
is applied even without overlap. This conservatively models
Level-A inflations at Level B to avoid non-linear constraints.
This completes the LP variable relations needed to describe
constraints derived from measured execution times.

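Constraint Set 2. The linear constraints for utilization
variables based on task execution-time data with CRPD
overheads added are as follows.

$\forall W \in \{0, \ldots, W^{max} - 1\}$ ::
    $U^{C}_{C} \ge \mathcal{L}^{C}_{C}(W_C, W)$
    $\forall p \in \{1, \ldots, m\}$ ::
        $U^{B}_{B,p} \ge \mathcal{L}^{B}_{B,p}(W_{B,p}, W)$
        $U^{C}_{B,p} \ge \mathcal{L}^{C}_{B,p}(W_{B,p}, W)$
        $U^{A}_{A,p} \ge \mathcal{L}^{A}_{A,p}(W_{A,p}, W)$
        $U^{B}_{A,p} \ge \mathcal{L}^{B}_{A,p}(W_{A,p}, W) + I^{B}_{A,p}$
        $U^{C}_{A,p} \ge \mathcal{L}^{C}_{A,p}(W_{A,p}, W) + I^{C}_{A,p}$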


$\forall W \in \{0, \ldots, W^{max} - 1\}$ ::
    $h \ge h'(W) + (W_C - W)(h'(W + 1) - h'(W))$
    $H \ge H'(W) + (W_C - W)(H'(W + 1) - H'(W))$

Schedulability  constraints.  To  fully  characterize  all  con¬ 
straints  on  utilizations  and  ways,  we  must  include  the 
schedulability  constraints  based  on  Expressions  (7)-(10). 
Expression  (10)  is  a  strict  inequality.  We  apply  a  small  de¬ 
crease,  e  =  10“^  to  its  right-hand  side  to  change  this. 

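Constraint Set 4. The linear constraints based on the
schedulability conditions (7)-(10) are as follows.

$\forall p :: U^{A}_{A,p} \le 1$

$\forall p :: U^{B}_{A,p} + U^{B}_{B,p} \le 1$

$\sum_{p=1}^{m} (U^{C}_{A,p} + U^{C}_{B,p}) + U^{C}_{C} \le m$

$\sum_{p=1}^{m} (U^{C}_{A,p} + U^{C}_{B,p}) + (m - 1)h + H \le m - \epsilon$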
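
The four conditions in Constraint Set 4 can be checked
directly for a candidate set of utilization values. The
following Python sketch is only illustrative: the function and
argument names are hypothetical, the per-core utilizations are
passed as lists indexed by core, and the default value of eps
is an assumed placeholder rather than the value used in the
paper.

```python
def satisfies_constraint_set_4(u_a_a, u_a_b, u_b_b, u_a_c, u_b_c, u_c_c,
                               h, big_h, eps=1e-6):
    """Check the four schedulability constraints for one candidate solution.

    u_x_y is the list of per-core Level-y utilizations of Level-x tasks;
    u_c_c is the total Level-C utilization of Level-C tasks.
    eps is an assumed small constant, not a value taken from the paper.
    """
    m = len(u_a_a)
    level_a_ok = all(u <= 1 for u in u_a_a)
    level_b_ok = all(a + b <= 1 for a, b in zip(u_a_b, u_b_b))
    ab_at_c = sum(u_a_c) + sum(u_b_c)
    level_c_ok = ab_at_c + u_c_c <= m
    strict_ok = ab_at_c + (m - 1) * h + big_h <= m - eps
    return level_a_ok and level_b_ok and level_c_ok and strict_ok


# Hypothetical values for a four-core platform.
ok = satisfies_constraint_set_4(
    u_a_a=[0.6] * 4, u_a_b=[0.4] * 4, u_b_b=[0.5] * 4,
    u_a_c=[0.1] * 4, u_b_c=[0.15] * 4, u_c_c=2.0, h=0.4, big_h=0.7)
overloaded = satisfies_constraint_set_4(
    u_a_a=[0.6] * 4, u_a_b=[0.4] * 4, u_b_b=[0.5] * 4,
    u_a_c=[0.1] * 4, u_b_c=[0.15] * 4, u_c_c=3.5, h=0.4, big_h=0.7)
```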

Linear program for LLC allocation. From Lemmas 1-3
and the discussion above, we have the following.

LP Allocation Theorem. An allocation scheme that produces
the minimum Level-C utilization for a task set while
maintaining all schedulability conditions can be determined
by solving the following LP.


Modeling h and H. To construct an LP that applies all
schedulability conditions to task systems, linear constraints
are also required for quantities specific to Expression (10).
We let $h$ and $H$ be our LP variables for $h'(W)$ and $H'(W)$,
respectively. Our constraints for these variables are
constructed in a similar fashion to the constraints for
utilization variables. Values for $h'(W)$ and $H'(W)$ are
determined from measured execution times for each integer
number of ways $W$ allocated to Level C after inflation. Linear
constraints are then constructed from adjacent data points.

This requires $h'(W)$ and $H'(W)$ to be non-convex as
well. These data functions, in fact, are non-convex under
our Execution Time Assumption. Let $\tau_h(W)$ denote the
Level-C task with highest inflated utilization when $W$ ways
are allocated to Level C. If $\tau_h(W)$ does not change with $W$,


minimize $\sum_{p=1}^{m} (U^{C}_{A,p} + U^{C}_{B,p}) + U^{C}_{C}$

subject to: Constraint Sets 1-4
            Non-negativity constraints on all variables.

The  objective  of  minimizing  total  Level-C  utilization  is 
used  here  as  a  greedy  heuristic  because  this  reduces  tardi¬ 
ness  bounds  for  Level-C  tasks  [6].  However,  this  objective 
function  serves  a  secondary  purpose.  Recall  from  our  dis¬ 
cussion  of  the  LP  variable  /  that  if  /  is  minimized,  then 
it  will  equal  f{x)  at  integer  values  of  x.  Minimizing  to¬ 
tal  Level-C  utilization  ensures  that  utilization  variables  re¬ 
flect  actual  system  utilization  values  determined  from  PETs 
when  LLC  area  variables  are  at  integer  values. 
Approximations. Under certain scenarios, the LP above
will  converge  to  integer  way  values  for  many  task  systems. 
Consider  the  LP  with  Constraint  Set  4  removed.  The  re¬ 
maining  constraints  on  Level-C  utilizations  from  Constraint 
Set  2  intersect  at  integer  way  values.  Level-C  utilization  is 
minimized  at  the  intersection  of  linear  constraints,  and  the 
LP  will  thus  converge  to  integer  values.  However,  the  way- 
parameter  values  that  minimize  Level-C  utilization  may  vi¬ 
olate  schedulability  conditions  (7),  (8),  or  (10).  In  this  sce¬ 
nario,  the  LP  with  Constraint  Set  4  may  not  converge  to  in¬ 
teger  way  values. 

If  the  program  solution  does  not  return  integer  values, 
we  can  round  way  values,  or  convert  the  LP  to  a  mixed- 
integer  LP  (MILP).  In  Sec.  5,  we  compare  schedulability 
for  rounded  LP-based  LLC  allocation  sizes  to  schedulabil¬ 
ity  for  MILP-based  LLC  allocation  sizes.  Note  that  non¬ 
integral  LP-based  LLC  allocation  sizes  are  not  necessarily 
guaranteed  to  be  nearest  to  integral  LLC  allocations  that  are 
schedulable  when  schedulable  allocations  exist.  As  shown 
in  Sec.  5,  however,  the  schedulability  loss  due  to  rounded 
LP-based  programming  is  fairly  small  in  many  cases. 
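
The text does not specify how non-integral way values are
rounded. One possible approach, sketched below in Python purely
as an assumption, tries the floor and ceiling of each returned
value, starting from the nearest rounding, and keeps the first
combination accepted by a caller-supplied check. The names
rounded_candidates, first_schedulable, and is_schedulable are
hypothetical, and the check used in the example (a total of at
most 16 ways) merely stands in for the real schedulability
conditions.

```python
from itertools import product
from math import ceil, floor


def rounded_candidates(ways):
    """Yield integer allocations built from the floor or ceiling of each value.

    The first candidate rounds every value to its nearest integer.
    """
    options = [sorted({floor(w), ceil(w)}, key=lambda v: (abs(v - w), v))
               for w in ways]
    yield from product(*options)


def first_schedulable(ways, is_schedulable):
    """Return the first rounded allocation accepted by is_schedulable."""
    for candidate in rounded_candidates(ways):
        if is_schedulable(candidate):
            return candidate
    return None


# Hypothetical stand-in check: at most 16 ways in total.
chosen = first_schedulable((9.6, 3.6, 3.4), lambda c: sum(c) <= 16)
```

Because only the floor and ceiling of each value are tried, a
schedulable integral allocation farther from the LP solution
would be missed, which is consistent with the caveat above.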

5  Evaluation 

We  now  discuss  experiments  we  conducted  to  assess  the  im¬ 
pact  of  our  LP-based  LLC  allocation  approach  on  task-set 
schedulability. 

Experimental  framework.  We  randomly  generated  task 
sets  and  determined  the  fraction  that  were  schedulable  on 
our  target  hardware  platform,  the  quad-core  ARM  Cortex 
A9  machine  mentioned  earlier,  the  LLC  of  which  has  16 
ways  and  16  colors.  To  determine  the  benefit  of  LP-based 
LLC  allocation  relative  to  other  alternatives,  we  compared 
our  approach  to  two  fixed  LLC  allocation  schemes.  We  call 
the  first  alternative  the  default  scheme  because  it  is  the  one 
considered  in  the  companion  paper  mentioned  earlier  [12]. 
For  any  task  set,  it  allocates  eight  ways  and  16  colors  (half 
the  LLC  space)  to  Level  C,  and  splits  the  remaining  LLC 
space  evenly  into  per-core  areas;  core  p’s  area  consists  of 
four  colors  and  four  ways  (1/8  the  LLC  space)  and  is  shared 
by  all  Level-A  and  -B  tasks  on  core  p.  We  call  the  second 
alternative  the  bypass  scheme.  Under  it,  all  Level-A  and  -B 
tasks  bypass  the  LLC  entirely  (they  have  a  zero-area  LLC 
allocation), and the Level-C tasks share the entire LLC
without restriction. This scheme is reflective of the intuition
that the provisioning of Level-A and -B tasks might be so
conservative that they derive almost no benefit from the LLC.

We  consider  both  the  original  LP  formulation  of  our  ap¬ 
proach,  where  the  returned  ways  must  be  rounded  if  non¬ 
integral,  and  the  corresponding  MILP  formulation.  We  com¬ 
pare  these  two  formulations  both  in  terms  of  accuracy  and 
runtime  performance. 

Task-set categories. Our schedulability study consisted of
81 separate experiments, each pertaining to a distinct
category of task sets. For each experiment, task sets were
generated first for the bypass scheme and then per-task
execution times were altered to obtain corresponding task sets
for the other considered schemes.

                 Type          A              B              C
Level-C Util.    C-heavy       [10, 30)       [10, 30)       remainder
Alloc. (%)       B-heavy       [20, 30)       [40, 60)       remainder
                 AB-moderate   [35, 45)       [35, 45)       remainder
Period (ms)      Light         {3, 6}         {6, 12}        [3, 33)
                 Contrasting                  {96, 192}      [10, 100)
                 Heavy         {48, 96}       {96, 192}      [50, 250)
Task Util.       Light         [0.001, 0.03)  [0.001, 0.05)  [0.001, 0.1)
                 Moderate      [0.02, 0.1)    [0.05, 0.2)    [0.1, 0.4)
                 Heavy         [0.1, 0.3)     [0.3, 0.5)     [0.5, 0.9)
WS Load Time     Light         [0.01, 0.1)    [0.01, 0.1)    [0.01, 0.1)
                 Moderate      [0.1, 0.25)    [0.1, 0.25)    [0.1, 0.25)
                 Heavy         [0.25, 0.5)    [0.25, 0.5)    [0.25, 0.5)

Table 1: Task set categories.

Task sets were generated for the bypass scheme by first
selecting the distributions to use in generating task
parameters. These distribution choices, which are listed in
Table 1, are as follows.

•  Selection  1:  Choose  the  distributions  to  use  in  deter¬ 
mining  the  fraction  of  the  overall  Level-C  utilization 
that  is  consumed  at  each  criticality  level.  There  are 
three  overall  choices  here,  as  shown  in  Table  1.  For 
example,  under  the  C-heavy  choice,  the  Level-C  uti¬ 
lization  of  each  of  Levels  A  and  B  will  be  between 
10%  and  30%  (exclusive)  of  the  total  Level-C  utiliza¬ 
tion,  with  the  remainder  going  to  Level  C. 

•  Selection  2:  Choose  the  distributions  to  use  in  generat¬ 
ing  task  periods.  Again,  there  are  three  overall  choices 
here,  as  shown  in  the  table. 

•  Selection  3:  Choose  the  distributions  to  use  in  gener¬ 
ating  Level-C  utilizations  for  individual  tasks.  Again, 
there  are  three  overall  choices. 

•  Selection  4:  Choose  the  distributions  to  use  in  deter¬ 
mining  the  time  required  to  load  a  task’s  working  set 
(WS)  from  memory.  The  load  time  is  expressed  as  a 
percentage  of  the  task’s  Level-C  execution  time.  As  be¬ 
fore,  there  are  three  overall  choices,  as  shown  in  the  ta¬ 
ble.  For  example,  under  the  Light  choice,  the  load  time 
for  any  task  will  be  between  1%  and  10%  (exclusive) 
of  its  overall  Level-C  execution  time. 
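
The four selections above, each with three choices, account
for the 81 task-set categories. A short Python sketch of that
enumeration follows; the variable names are hypothetical, and
the choice labels are those listed in Table 1.

```python
from itertools import product

# Choice labels as listed in Table 1 (one tuple per selection).
UTIL_ALLOCATION = ("C-heavy", "B-heavy", "AB-moderate")
PERIOD = ("Light", "Contrasting", "Heavy")
TASK_UTIL = ("Light", "Moderate", "Heavy")
WS_LOAD_TIME = ("Light", "Moderate", "Heavy")

# One category per combination of the four selections.
categories = list(product(UTIL_ALLOCATION, PERIOD, TASK_UTIL, WS_LOAD_TIME))
num_categories = len(categories)
```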

Generating  task  sets.  In  generating  task  sets  for  the  bypass 
scheme,  we  allowed  the  total  Level-C  utilization  to  vary 
from  0.1  to  6.1  in  steps  of  0.2.  For  each  total  Level-C  uti¬ 
lization  in  this  range,  we  evaluated  between  100  and  2,000 
randomly  generated  task  sets  to  estimate  mean  schedulabil¬ 
ity  with  95%  confidence  to  within  a  confidence  interval  of 
0.05.  Each  individual  task  set  was  generated  by  randomly 
selecting  relevant  parameters  using  the  distributions  cho¬ 
sen  above.  If  a  given  task  is  a  Level-A  (Level-B)  task,  then 
it  also  requires  a  Level-A  and  -B  (Level-B)  utilization.  A 
task’s  Level-B  utilization  (if  required)  was  defined  to  be 
s  times  its  Level-C  utilization,  where  the  scaling  factor  s 
ranges  uniformly  within  [10/3,  20/3].  This  choice  of  scal¬ 
ing  factor  was  based  on  measurement  data  from  our  ARM 
platform. A task's Level-A utilization (if required) was
defined to be 1.5 times its Level-B utilization. This reflects
the previously mentioned 50% inflating of Level-B PETs to
obtain Level-A PETs. Each task's per-level PETs are implicitly
determined by its period and per-level utilizations.
Given these PETs, and the WS load times selected using
the distribution discussed under Selection 4 above, we
determined a task's actual WS size (WSS) using documented
memory-access latencies for our ARM machine. WSSs affect
how PETs vary with LLC-allocation schemes, which
we discuss next.

Figure 7: Measured vs. generated PETs with respect to ways.
(a) Measured PETs.
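
The per-level scaling described above can be sketched as
follows. This is an illustrative Python fragment, not the
generator used for the study: the function names are
hypothetical, and it covers only the derivation of Level-B and
Level-A utilizations and PETs from a given Level-C utilization
and period.

```python
import random


def per_level_utilizations(level, u_c, rng):
    """Return {level: utilization} for a task with Level-C utilization u_c.

    Level-B utilization is s times u_c with s drawn uniformly from
    [10/3, 20/3]; Level-A utilization is 1.5 times the Level-B value.
    """
    utils = {"C": u_c}
    if level in ("A", "B"):
        utils["B"] = rng.uniform(10 / 3, 20 / 3) * u_c
    if level == "A":
        utils["A"] = 1.5 * utils["B"]
    return utils


def per_level_pets(utils, period_ms):
    """PETs follow from the period: PET = utilization * period."""
    return {lvl: u * period_ms for lvl, u in utils.items()}


rng = random.Random(0)
level_a_task = per_level_utilizations("A", 0.01, rng)
level_a_pets = per_level_pets(level_a_task, period_ms=6)
```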

The above process yields a task set for the bypass
scheme. To obtain a corresponding task set for the other
schemes, we merely have to scale PETs (and hence utilizations)
to reflect allocated LLC areas. The scaling factors
we used in determining PETs for the non-bypass schemes
were based on measurement data obtained from our ARM
platform, with an adjustment applied for WSS-related reasons.
In particular, a task's WSS determines the maximum
amount of cache space it uses. As we allocate additional
LLC space for a task beyond its WSS, its execution time
should not change significantly. We account for this when
determining scaled PETs.

Fig. 7(a) shows some of the measured Level-B PET
data we collected on our ARM platform, and Fig. 7(b)
shows some of the PETs obtained via our generation process.
Note that our generated PETs are only approximately
non-convex. To apply our LP techniques, any non-convexity
must be masked by upper bounding. Such upper bounding
introduces pessimism in the analysis.

In order to complete the specification of a task set, its
Level-A and -B tasks must be assigned to cores. We obtained
such an assignment by using the worst-fit-decreasing
bin-packing heuristic to first assign Level-A tasks, based
on their Level-A utilizations under the bypass scheme, and
then to assign Level-B tasks using the remaining capacities,
based on the Level-B utilizations under the bypass scheme.
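
A minimal Python sketch of worst-fit-decreasing assignment is
given below. The function name and the load parameter are
hypothetical. How the remaining capacities are computed before
the Level-B pass is left to the caller here, since the text does
not spell it out.

```python
def worst_fit_decreasing(utils, m, capacity=1.0, load=None):
    """Assign tasks to m cores, largest utilization first.

    Each task goes to the core with the most remaining capacity.
    Returns (assignment, load); assignment is None if a task does not fit.
    """
    load = list(load) if load is not None else [0.0] * m
    assignment = {}
    for task, u in sorted(utils.items(), key=lambda kv: kv[1], reverse=True):
        core = min(range(m), key=lambda p: load[p])   # least-loaded core
        if load[core] + u > capacity:
            return None, load
        load[core] += u
        assignment[task] = core
    return assignment, load


# Hypothetical Level-A utilizations on a two-core platform.
assignment, load = worst_fit_decreasing(
    {"t1": 0.5, "t2": 0.4, "t3": 0.3, "t4": 0.2}, m=2)
```

A second call with the Level-B utilizations and the resulting
per-core load passed as load would perform the Level-B pass
under the same assumption.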

This concludes our overview of the task-set generation
process we used. This process is described in much greater
detail in an appendix.

Results. Our study resulted in 81 schedulability plots.
Due to space constraints, we discuss only the plots
shown in Fig. 8, which reflect generally seen trends
across all plots; the other plots can be found in an online
appendix (available at
http://www.cs.unc.edu/~anderson/papers.html). In insets
(a)-(c) of Fig. 8, schedulability plots are given for three
categories of task systems; these plots depict the fraction of
the generated task systems deemed schedulable, as a function
of overall Level-C utilization under the bypass scheme, for
each considered LLC allocation method. Insets (d)-(f) give
corresponding probability distributions for the number of
allocated ways at each level under a MILP-based allocation.
For example, inset (f) indicates that for the task-system
category considered in inset (c), 10-14 ways tended to be
allocated to Level C, 3-7 to Level B, and 3-7 to Level A. We
make the following observations from this data.

Obs. 1. Using MILP- and LP-based LLC allocations
significantly improved schedulability in approximately a third
of the tested task-system categories, increasing schedulability
by 20-50% in some cases, and by a factor of two in others.
For the other categories, only moderate improvements
resulted.

Fig. 8(a) depicts one of several categories that exhibited
significant improvements. Fig. 8(d) suggests that, for this
category, the usage of LP techniques adapts LLC allocations
to account for the high CRPD overheads expected in this
case. As seen, little to no LLC space is given to any
level, suggesting that CRPD overheads outweigh any
performance gains provided by the LLC. Fig. 8(b) depicts one
of several categories that yielded only mild improvements.
Fig. 8(e) suggests that, for this category, the usage of LP
techniques results in LLC allocations that vary dramatically.
The low impact of LLC allocation choice on schedulability
is not surprising, since task sets in this category have light
memory utilization, and therefore task utilizations are not
very sensitive to LLC area size.

Obs. 2. The usage of LP-based allocations resulted in little
to no degradation in schedulability in comparison to
MILP-based allocations in all tested task-system categories.

Insets (a)-(c) of Fig. 8 show very little difference in
schedulability results for these two algorithms. Due to the
similarities of the LP and MILP algorithms, both produce
similar LLC-allocation schemes for each task set. These
schemes have similar effects on schedulability as a result.

Obs.  3.  While  MILPs  have  exponential  time  complexity, 
the  actual  runtime  performance  of  our  MILP  allocation 
scheme  was  roughly  equivalent  to  that  of  our  LP  scheme. 

Across all task systems in all experiments, our MILP
scheme took 151 ms on average and 1377 ms in the worst
case, while our LP scheme took 148 ms on average and 315
ms in the worst case.

Figure 8: Schedulability and LP-based LLC allocation results
for three categories of task sets. Category choices are listed
in order of categories. Insets: (d) Figure (a) allocation.
(e) Figure (b) allocation. (f) Figure (c) allocation.

6  Conclusion 

To our knowledge, this is the first paper to consider the
optimization problem of allocating LLC areas among tasks of
different criticality levels in a mixed-criticality multicore
system. We addressed this problem in the context of MC²
through the use of LP techniques that take into account both
schedulability and CRPD overheads. We demonstrated the
efficacy of these techniques by presenting an experimental
evaluation that shows that their usage can have significant
benefits from a schedulability viewpoint. Our LP techniques
achieved schedulability improvements similar to those of our
MILP variant. In our experiments, the LP and MILP variants
proved to have similar runtime overheads.

In  the  LLC  allocation  problem  considered  herein,  only 
the  number  of  allocated  ways  is  viewed  as  a  variable.  The 
number  of  allocated  colors  (which  determine  the  allocated 
sets)  can  be  varied  as  well.  However,  varying  both  parame¬ 
ters  creates  an  optimization  problem  that  is  difficult  to  ad¬ 
dress  using  LP  techniques.  Nonetheless,  this  more  general 
optimization  problem  warrants  further  study. 
Appendix A


We  now  define  all  remaining  values  of  F^(l^)  as  follows: 


In  this  appendix  we  describe  our  process  for  generating 
LLC-area  dependencies  for  task  utilization  at  each  critical¬ 
ity  level.  We  start  by  generating  task  systems  with  a  single 
execution  time  cost,  r.cost,  for  each  task.  To  simulate  LLC- 
area  dependencies,  we  applied  a  separate  non-increasing, 
non-concave  function,  over  the  number  of  allo¬ 

cated  ways  W  to  the  utilization  of  task  r  for  criticality  level 


t 


n^COSt 

uKw)  = 


_ £ 

We  also  have  an  upper-bounding  function  > 

FI{W)  that  is  non-convex.  This  upper-bounding  function 
is  used  for  our  linear  programs. 

Task  execution  times  may  have  varying  sensitivities  to 
LLC  area  size  based  on  a  task’s  memory  use  behavior.  To 
describe  more  precisely  different  categories  of  LLC  area 
size  dependencies,  we  must  first  describe  how  the  function 
T!-{W)  was  generated.  We  start  by  generating  a  non-convex 
function  F^{W)  for  each  task  at  criticality  level  £,  where 
Fi{0)  =  1.0,  where  i  is  Level  B  or  C.  The  Level-A  function 
is  generated  last  using  our  Level-B  function,  since  Level-A 
utilizations  should  be  50%  greater  than  Level-B  costs  for 
each  number  of  ways  W. 

To  constrain  the  initial  decrease  in  the  function  from 
ways  0  to  1 ,  we  define  parameters  <  1  and  for 

each criticality level $\ell$. Letting rand(x, y) denote a function
that returns a random value in the interval [x, y], we define


F/(1)  =rand{b\D^). 


We  assume  that,  prior  to  overhead  inflation,  task  execution 
times  should  not  increase  as  LLC-allocation  size  increases. 
Hence,  we  ensure  F/  is  non-increasing  by  upper-bounding 

each  value  F/(VL)  where  W  >  1  by  F^(IL),  defined  as 
follows: 

For  the  function  to  be  non-convex,  each  value  Fl{W) 
where  W  >  2  must  be  lower-bounded  by  another  func¬ 
tion  F_l{W).  This  function  is  determined  by  taking  the  de¬ 
crease  in  function  value  from  W  —  2  to  W  —  1-  that  is, 
Ff{W  —  2)  —  FI{W  —  1)  -  and  ensuring  that  the  decrease 
from  IL  —  1  to  IL  is  not  greater 

fI{W)  =  F^{W  -  1)  -  (F/(VF  -  2)  -  F/(VF  -  1)). 

We may additionally want to limit these bounds further such
that the function “flattens out” more slowly or more quickly. We use pa¬
rameters  uj  <  1  and  <  uj^  to  define  more  restricted 
bounds  F^W)  and  F^W).  The  parameters  and  uj^ 
are  used  to  calculate  interpolations  between  the  values  of 
F/(IL)andF/(IL) 

FRW)  ^  FiiW)  +  cb\Fl{W)  -  F^{W)), 
pyw)  ^  FiiW)  +  cb\Fl{W)  -  FiiW)). 


F^W)  =  rand{Fl{W),  fI{W)),  W  >  1. 
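
The curve-generation steps of this appendix can be sketched
in Python as follows. This is a simplified reading under
stated assumptions, not the exact generator: every function and
parameter name is hypothetical, each new value is drawn between
the interpolated lower and upper bounds, and the two
modifications described next (a floor on the scaling value and
no further decrease once the working set fits) are applied at
the end. The values chosen for f_min and wss_ways are
placeholders.

```python
import random


def generate_scaling_curve(num_ways, b_lo, b_hi, w_lo, w_hi, f_min,
                           wss_ways, rng):
    """Return F[0..num_ways] with F[0] = 1.0.

    The curve is non-increasing, and the decrease from one way to the
    next never exceeds the preceding decrease.
    """
    curve = [1.0, rng.uniform(b_lo, b_hi)]
    for w in range(2, num_ways + 1):
        upper = curve[w - 1]                                  # non-increasing
        lower = curve[w - 1] - (curve[w - 2] - curve[w - 1])  # decrease cannot grow
        lo = lower + w_lo * (upper - lower)
        hi = lower + w_hi * (upper - lower)
        curve.append(rng.uniform(lo, hi))
    # Floor on the scaling value (assumed parameter f_min).
    curve = [max(v, f_min) for v in curve]
    # No further change beyond the ways needed to hold the working set.
    return [curve[min(w, wss_ways)] for w in range(num_ways + 1)]


rng = random.Random(1)
curve = generate_scaling_curve(16, 0.9, 0.97, 0.0, 0.15, 0.3, 10, rng)
drops = [curve[i] - curve[i + 1] for i in range(len(curve) - 1)]
```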

Ff{W)  is  derived  from  F^iyV)  by  considering  two  modifi- 
cations  to  the  curves  generated  so  far. 

Modification 1. These curves have no lower bound. Our
scaling  function  for  a  task’s  utilization  should  have  a  lower 
bound  (we  would  not  expect  the  LLC  to  ever  reduce  utiliza¬ 
tions  by  95%,  for  instance).  For  each  criticality  level,  we 
define  a  lower  bound  on  utilization  scaling 

Modification  2.  Each  task  r  has  a  WSS  t.wss.  As  stated 
in  Sec.  5,  we  expect  execution  times,  and  thus  utilizations, 
to not decrease significantly as a task is allocated additional
LLC  space  beyond  its  WSS.  We  denote  to  be  the  least 
number  of  ways  required  for  a  task  to  fit  its  entire  working 
set  into  the  allocated  area.  This  completes  the  functions  and 
parameters  we  need  to  derive  a  non-convex  function  with 
which  to  upper-bound  task  utilizations.  For  Level  B,  we  de¬ 
fine 


F{W)  =  max{Fi^{W),  for  W  <  Wr^ 

J-f  =  J-f  -  1) ,  foxW> 


At  each  level,  we  want  r.cost  for  r  G  F^  to  represent 
the  bypass-scheme  utilization  at  level  £.  Remember,  in  this 
scheme.  Level  B  is  given  0  ways  on  all  cores  and  Level  C  is 
given  ways.  Hence,  for  G  F^,  Ff{0)  should  be 

1.0,  but  for  Ti  G  Fc,  Ff  should  be  1.0.  For  Level 

C,  we  first  define  a  helper  function  (W)  derived  in  a 

_ ^ 

similar  fashion  as  (W)  and  then  normalize  this  function 
with  respect  to  Mf 


Mf{W)  =  max{Ff{W),FgiJ, 

Vff  {W)  =  max{Ff{W),Fgi^),  for  W  > 

M?(W) 


Ff{W)  is  derived  by  applying  slight  variations  to  val- 

_ £ 

ues  of  FpVF).  We  randomly  pick  up  to  eight  different  way 

values  {VFi,  ...kFtot},  Wtot  <  8  between  one  and  fifteen. 

_ £  _ £ 

We  determine  the  range  R  =  F^O)  — over  which 
the  non-convex  function  varies,  and  for  each  way  value 
W  G  {VFi,  ...Wtot]^  we  assign  F/(1L)  the  following: 

Fl{W)  =  f\{W)  -  rand{<d,  0.05)  •  R 

For  all  other  way  values,  F/(1L)  =  f\{W). 

At  this  point,  we  have  a  unique  set  of  parameters 


=  {b^  ^  ^Fmin^} 


for  generating  worst-case  execution  time  behavior  at  Level 
B  and  a  similar  set  of  parameters 
for describing average-case execution time behavior at
Level  C.  For  our  paper,  we  chose  and  values  in 

the  range  [0.9,0.97)  to  produce  utilizations  that  initially  de¬ 
clined  steadily  as  ways  increase  from  0.  We  chose  and 
in  the  range  [0,  0.15)  to  ensure  initially  steady  declines 
in  utilization  tend  to  not  flatten  out  as  way  sizes  increase. 